5) Schema Design - From the Absolute Basics to the Advanced Level
Previous Part: Query Optimization
Now, let’s break Schema Design down from beginner to advanced so you’ll understand how to plan, build, and optimize database structures that are efficient, scalable, and easy to work with.
1. What is Schema Design?
A schema is the blueprint of a database. It defines:
-
Tables (entities)
-
Columns (attributes)
-
Data types
-
Relationships (foreign keys)
-
Constraints (rules)
Good schema design makes your queries faster, data more reliable and future changes easier.
2. Beginner Level — Core Concepts
2.1 Identify Entities
Entities are things you want to store data about.
Example (E-commerce):
-
Customers -
Products -
Orders
2.2 Determine Attributes
For each entity, list its properties.
Example (Customers table):
-
customer_id(Primary Key) -
name -
email -
phone
2.3 Choose Data Types
Use the smallest data type that fits your data.
Example:
-
INTfor IDs -
VARCHAR(255)for text -
DATEfor dates -
DECIMAL(10,2)for money
2.4 Define Primary Keys
Each table must have a Primary Key: a unique identifier.
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255)
);
2.5 Define Relationships
-
One-to-Many → One customer can place many orders.
-
Many-to-Many → Products appear in many orders, orders have many products.
Example:
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
3. Intermediate Level - Normalization & Keys
3.1 Normalization
Breaking data into multiple related tables to reduce redundancy and improve consistency.
Example: Bad (Unnormalized)
Orders(order_id, customer_name, customer_email, product_name, product_price)
Problems:
-
Customer info repeated for each order.
-
If email changes, must update many rows.
Normalized (3NF):
Customers(customer_id, name, email)
Products(product_id, name, price)
Orders(order_id, customer_id)
Order_Items(order_id, product_id, quantity)
3.2 Foreign Keys & Referential Integrity
-
Foreign keys ensure related data exists.
-
Prevents orphaned records.
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
3.3 Constraints
-
NOT NULL→ field must have a value. -
UNIQUE→ no duplicate values allowed. -
CHECK→ restrict allowed values.
CHECK (price > 0)
3.4 Composite Keys
Primary key made of multiple columns.
Example:
Order_Items(order_id, product_id) - unique combination for each order.
4. Advanced Level - Performance & Scalability
4.1 Index Design in Schema
Plan indexes during schema creation, not after performance problems appear.
Example:
CREATE INDEX idx_customer_email ON Customers(email);
4.2 Denormalization for Speed
Sometimes, store redundant data for faster reads.
Example:
-
Store
total_amountinOrdersinstead of calculating fromOrder_Itemsevery time.
Trade-off: Faster reads but risk of inconsistency if not updated properly.
4.3 Partitioning
Split large tables into parts for faster queries.
Example:
PARTITION BY RANGE (YEAR(order_date))
4.4 Handling Many-to-Many Relationships Efficiently
Use join tables with indexes.
Example:
CREATE TABLE Order_Items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
4.5 Schema for Analytics vs. Transactions
-
OLTP (Online Transaction Processing) → normalized, fast writes.
-
OLAP (Online Analytical Processing) → denormalized, optimized for queries.
4.6 Versioning & Soft Deletes
Instead of deleting data, mark it as inactive:
is_deleted BOOLEAN DEFAULT FALSE
Keeps historical records for audits.
5. Example: E-commerce Schema (Balanced for Speed & Consistency)
CREATE TABLE Customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE Products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) CHECK (price > 0)
);
CREATE TABLE Orders (
order_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
CREATE TABLE Order_Items (
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT CHECK (quantity > 0),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
6. Best Practices Summary Table
Schema Design Practice Quiz going from easy → hard, covering concepts, coding, and real-world optimizations.
I’ve included answers + explanations so you can check your work.
📚 Schema Design Practice Quiz
🟢 Easy Level — Fundamentals
1. Identify at least three entities for a Library Management System.
Answer:
-
Books -
Members -
Loans
(You might also include:Authors,Publishers)
Explanation: Entities are the main "things" in your domain.
2. Which data type would you use for:
a) Book title
b) Loan date
c) Member ID
Answer:
a) VARCHAR(255)
b) DATE
c) INT or BIGINT (depending on expected size)
Explanation: Choose the smallest data type that fits the data.
🟡 Medium Level — Normalization & Keys
3. This table is unnormalized:
Orders(order_id, customer_name, customer_email, product_name, product_price, quantity)
Normalize it to 3NF.
Answer:
Customers(customer_id, name, email)
Products(product_id, name, price)
Orders(order_id, customer_id, order_date)
Order_Items(order_id, product_id, quantity)
Explanation:
-
Remove repeating customer and product data.
-
Separate into related tables with foreign keys.
4. Which is better for a many-to-many relationship between Students and Courses?
a) Add multiple course_id columns to Students table
b) Create a Student_Courses table
Answer: b)
Explanation: Many-to-many relationships need a join table for flexibility and scalability.
5. Design a Order_Items table that ensures no duplicate product in the same order.
Answer:
CREATE TABLE Order_Items (
order_id INT,
product_id INT,
quantity INT CHECK (quantity > 0),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
Explanation: Composite primary key ensures each (order_id, product_id) is unique.
🔴 Hard Level — Optimization & Scalability
6. Your Orders table has 500M rows. You mostly query last 30 days of data. Suggest two schema-level optimizations.
Answer:
-
Partition the table by
order_date(monthly/weekly). -
Create a partial index:
CREATE INDEX idx_recent_orders ON Orders(order_date)
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';
Explanation: Reduces scanned rows, improving speed.
7. The Invoices table has this query:
SELECT * FROM Invoices WHERE YEAR(invoice_date) = 2024;
Explain why it’s slow and redesign the schema/index to improve it.
Answer:
-
Wrapping
invoice_dateinYEAR()prevents index use. -
Redesign query:
SELECT *
FROM Invoices
WHERE invoice_date >= '2024-01-01'
AND invoice_date < '2025-01-01';
-
Add index:
CREATE INDEX idx_invoice_date ON Invoices(invoice_date);
Explanation: Index can now be used for range scans.
8. You have a Users table for an OLTP system and a reporting dashboard that runs slow aggregations. Suggest two schema designs to separate transactional and analytical workloads.
Answer:
-
OLTP schema → normalized, small indexes, fast writes.
-
OLAP schema → denormalized star schema with pre-aggregated data in a separate reporting table.
Explanation: Prevents heavy analytics from slowing transactions.
9. Your Products table needs to store historical price changes without losing old values. Design the schema.
Answer:
CREATE TABLE Product_Prices (
product_id INT,
price DECIMAL(10,2),
start_date DATE,
end_date DATE,
PRIMARY KEY (product_id, start_date),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
Explanation: Use a temporal table pattern to track history.
10. You have a Logs table with billions of rows. You must keep 5 years of data but queries for recent month must be lightning fast. Suggest three schema changes.
Answer:
-
Partition by month or year.
-
Clustered index on
(log_date, id). -
Hot-cold storage: Move old data to cheaper storage but keep last month in fast storage.
Explanation: Matches storage strategy to access frequency.
✅ Key Takeaways from This Quiz:
-
Start with normalization to remove redundancy.
-
Then add indexes, partitions, and caching for performance.
-
Adjust schema differently for transactional vs analytical use cases.
-
Design with future scalability in mind.

Comments
Post a Comment